IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 

In re patent application of 

Frederick J. Damerau and David Johnson 

Serial No. 09/605,709 Group Art Unit 2654 

Filed June 27, 2000 Examiner Abul K. Azad 

For AUTOMATED SET UP OF WEB-BASED NATURAL LANGUAGE 
INTERFACE 

Mail Stop Appeal Brief - Patents 
Commissioner for Patents 
P.O. Box 1450 
Alexandria, VA 22313-1450 



RECEIVED 
CENTRAL FAX CENTER 

APR 11 2005 



REPLY BRIEF OF APPELLANTS UNDER 37 C.F.R. §1.193(b) 

Sir: 

The Examiner mailed an Answer on February 10, 2005 responsive to 
Appellants' Supplemental Appeal Brief. Please charge Deposit Account 50-0510 of 
International Business Machines Corporation (IBM-Yorktown) in the amount of 
$500.00 (37 C.F.R. 1.17(c)) to cover the fee for filing this reply brief. 



FURTHER ARGUMENT 

The Examiner argues in his Answer that Sarukkai teaches sparse n-grams. For 
the reasons that follow, the applicant respectfully disagrees with this conclusion. 

It is important to understand what it is that Sarukkai discloses. Sarukkai 
discloses an improved speech interface to the Internet. The Internet is not like other 
speech recognition domains, such as when a person reads text or speaks 
spontaneously. The Internet presents a different problem from other speech 
recognition domains because of the huge vocabulary of the Internet. As with other 
speech recognition domains, the user is trained on a vocabulary, thereby enabling the 
speech recognition system to use the spoken representation to search for and find the 
appropriate textual representation. This is done within a model space, and n-grams 
are commonly used as the language model for this space. 

But for a user trying to use voice commands to search the Internet, the size of 
the Internet vocabulary means an increased likelihood that the user will speak using 
words which are "out-of-vocabulary" in the terminology of voice recognition. The 
problem being addressed by Sarukkai is finding the textual representations (i.e. text 
words, from a vocabulary) that match the spoken representation. This is done by 
searching, but not the kind of searching that is being done by the user who is 
using voice commands to navigate the Internet. The "speech recognition search" 
specified by Sarukkai (col. 2, lines 64-65; col. 3, line 2; col. 7, line 21) is trying to 
find the corresponding text for spoken words. 

However, as Sarukkai states explicitly, this "search" for the optimal word 
sequence W (i.e. textual representation) corresponding to a given acoustic observation 
X combines acoustic scores (for acoustic observation X) and language scores (for the 
possible word sequence W) in an evaluation function (col. 4, lines 55-65). This is a 
highly mathematical definition of a "search" but is well understood in the art of 
speech recognition. In non-mathematical terms, what is going on here is the search 
for a best fit out of possible text word sequences to a given voice sequence. The 
process starts from the observed voice sequence, which is given an acoustic score, and 
then evaluates possible text word sequences (each having a language score) in 
combination with this acoustic score using an evaluation function, ending up with a 
best fit. Sarukkai's contribution to this voice recognition evaluation is to bias the 
evaluation function towards "the set of words that are present in the web page (or that 
specify links) currently being viewed" (col. 5, lines 7-14). The result of this bias is 
to improve the ability of the voice recognition system to correctly identify the web 
page desired from 45% to 75%, as shown in Table 2 of Sarukkai. 
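
For illustration only, the kind of best-fit evaluation described above can be 
sketched as follows. This is a minimal sketch under stated assumptions, not the 
evaluation function actually recited in Sarukkai: the function names, the additive 
log-score combination, the per-word boost, and all numeric values are hypothetical. 

```python
import math

def language_score(words, unigram_logp, boosted=frozenset(), boost=0.0):
    """Sum per-word log-probabilities, adding a bonus for each boosted word.

    Hypothetical scoring: each word is scored individually, so the order of
    the words plays no part in the boost.
    """
    total = 0.0
    for w in words:
        total += unigram_logp.get(w, math.log(1e-6))  # floor for unknown words
        if w in boosted:
            total += boost
    return total

def best_fit(candidates, acoustic_logp, unigram_logp, boosted=frozenset(), boost=0.0):
    """Return the candidate word sequence with the highest combined score."""
    return max(
        candidates,
        key=lambda ws: acoustic_logp[ws]
        + language_score(ws, unigram_logp, boosted, boost),
    )

# Hypothetical candidates for one spoken observation X (values are invented).
candidates = [("stock", "quotes"), ("stalk", "coats")]
acoustic_logp = {("stock", "quotes"): -4.0, ("stalk", "coats"): -3.5}
unigram_logp = {
    "stock": math.log(0.01),
    "quotes": math.log(0.005),
    "stalk": math.log(0.02),
    "coats": math.log(0.02),
}
page_words = frozenset({"stock", "quotes"})  # words present on the viewed page

unbiased = best_fit(candidates, acoustic_logp, unigram_logp)
biased = best_fit(candidates, acoustic_logp, unigram_logp, page_words, boost=2.0)
```

With these invented numbers, the unbiased evaluation prefers the misrecognition 
and the biased evaluation prefers the words found on the page. The bias acts on 
each word individually and takes no account of word order. 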

The distinction between the present invention and Sarukkai may be better 
understood by considering an example that appears in both disclosures, and by 
considering the significance of n-grams, and in particular "sparse n-grams", in dealing 
with this example. As will be clear from what follows, Sarukkai takes a completely 
different approach to the example. The example is discussed in the background 
section of Sarukkai (col. 3, lines 30-36), in connection with the limitations of certain 
prior art approaches. Suppose that the phrase "THE CURRENT STOCK QUOTES" 
is the only link in the page, and the user speaks the words "THE CURRENT STOCK 
PRICE QUOTES." If the word "PRICE" is missing from the voice recognition 
vocabulary, the prior art method (RGDAG) would fail to make the connection. 

Now consider how Sarukkai characterizes and solves this problem. Sarukkai 
notes that a proper method of handling these "out of vocabulary" (OOV) words is not 
available and the problem is even more prominent in the context of web surfing (col. 
2, lines 1-16). Significantly, Sarukkai notes in the background section that "it is 
necessary to get the in-vocabulary words correct in the presence of OOV words" (col. 
2, lines 14-16). This is significant because this is precisely the effect of the biasing 
technique used by Sarukkai, which boosts the probabilities of individual words 
belonging to the web triggered word set (col. 10, lines 26-31; emphasis supplied). 
There is nothing indicated for the sequence of words in the web triggered word set: it 
is the probabilities of individual words that are boosted in the evaluation function. 
Thus, this method of building on a language model (col. 10, lines 23-24) easily caters 
to integration with n-grams (col. 10, line 35). 

Now consider the present invention. As stated in the Supplemental Appeal 
Brief (at page 9), the term "sparse n-gram" refers to the sequence of words allowing 
for gaps between words making up the n-gram. The gaps are limited by establishing a 
distance d which is the maximum separation between the first and last words of the 
n-gram. Thus, the above example "THE CURRENT STOCK PRICE QUOTES" may 
be represented as the 4-gram "THE CURRENT STOCK QUOTES" with a d value of 
4 instead of 3. 
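
The definition above can be illustrated with a short sketch. It assumes that 
"separation" is measured as the difference between the positions of the first and 
last words of the n-gram in the spoken sequence; the function name 
matches_sparse_ngram and the matching strategy are hypothetical and are not taken 
from the specification. 

```python
def matches_sparse_ngram(ngram, utterance, d):
    """Return True if the words of `ngram` occur in order in `utterance`
    with (position of last word) - (position of first word) <= d.

    A conventional n-gram corresponds to d == n - 1 (no gaps allowed).
    """
    n = len(ngram)
    if n == 0 or d < n - 1:
        return False
    for start, word in enumerate(utterance):
        if word != ngram[0]:
            continue
        # Only words within distance d of the first word may be used.
        window = utterance[start:start + d + 1]
        pos = 0
        for w in window:
            if w == ngram[pos]:
                pos += 1
                if pos == n:
                    return True
    return False

link = "THE CURRENT STOCK QUOTES".split()
spoken = "THE CURRENT STOCK PRICE QUOTES".split()

conventional = matches_sparse_ngram(link, spoken, d=3)  # no gap permitted
sparse = matches_sparse_ngram(link, spoken, d=4)        # one-word gap permitted
```

Under these assumptions the conventional setting (d = 3) does not match the spoken 
words, because "PRICE" intervenes, while the sparse setting (d = 4) does. 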

The Examiner argues that "sparse n-gram" has no meaning because it is 
claimed using the language "wherein the n-grams may be sparse or non-sparse." The 
Examiner argues that the phrase "sparse or non-sparse" is inclusive and therefore any 
kind of n-grams can be read on the claimed limitations. This is a sleight of hand 
argument. It is clear from the disclosure of the present invention that the inventors 
have defined the term "sparse n-gram" with its distance d to allow for gaps. The 
n-grams referred to in Sarukkai are the conventional n-grams where d = n - 1, i.e., 
there are no gaps. The Examiner's sleight of hand argument essentially tries to read 
the claim language out of the claim. This cannot rightly be done. If sparse n-grams 
have a gap, i.e. d > n - 1, then the language "sparse or non-sparse" means d ≥ n - 1. 
In essence, by claiming "sparse or non-sparse" the clear meaning of the inventors is to 
use an expanded notion of n-grams, including "sparse n-grams" which allow for gaps. 
While it is possible to rephrase the language to say "wherein the n-grams include 
sparse n-grams" the meaning would not change. 

Further, the Examiner cites the teaching from Sarukkai regarding "a set of 
words selectively extracted from the Web page source" (col. 7, lines 65-66) as 
referring to a sparse n-gram, but this is incorrect. The cited reference is not to an 
n-gram but rather to the above described "set of words that are present in the web page" 
and used to bias the probabilities of individual words in the evaluation function. 
Consequently, as Sarukkai makes clear, this "set of words" is not an n-gram (sparse or 
otherwise) but rather is a collection of individual words for a boosting approach that 
builds upon the n-gram language model. 
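
The difference between a set of words and an n-gram can be shown with a small 
sketch. The helper names word_set and ngrams are hypothetical and are offered only 
to illustrate the point that a set carries no sequence information. 

```python
def word_set(text):
    """Collect the distinct words of `text`; order and adjacency are discarded."""
    return frozenset(text.upper().split())

def ngrams(text, n):
    """List the conventional (gap-free) n-grams of `text`, preserving order."""
    words = text.upper().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

original = "the current stock quotes"
scrambled = "quotes stock current the"

same_set = word_set(original) == word_set(scrambled)
same_bigrams = ngrams(original, 2) == ngrams(scrambled, 2)
```

The two texts yield the same set of words but different bigrams, since only the 
n-grams depend on the order in which the words appear. 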

The Examiner attempts to obtain further mileage from the "set of words" used 
by Sarukkai for biasing as described above. The Examiner argues that this "set of 
words" amounts to a "natural language interface" as claimed by the invention. It is 
manifest that the Examiner's suggestion in this regard cannot stand. The "set of 
words" is a collection of words used individually for biasing the voice recognition 
evaluation function. Although these words are taken from a web page, which the user 
can observe visually, any function they have on the web page because of their 
sequence or arrangement is lost when they are collected into a mere "set of words" for 
biasing purposes. They have no function in Sarukkai relating to an interface, much 
less a natural language interface. Indeed, Sarukkai provides a "speech interface" (col. 
1, line 16) to information on the Internet, whereas the present invention is concerned 
not with speech mediated by a voice recognition system but with natural language as 
that term is used by those skilled in the computer arts, namely, to distinguish 
languages used by humans for general-purpose communication from constructs such 
as computer programming languages or the languages used in the study of formal 
logic. 




YOR920000324US1 






CONCLUSION 



The issue for resolution in this appeal is whether claims 1 to 6 are anticipated 
by U.S. Patent No. 5,819,220 to Sarukkai under the objective standards of 35 U.S.C. 
§ 102(b). The Supplemental Appeal Brief, further supplemented by the foregoing 
arguments, demonstrates that Sarukkai does not anticipate each and every element of 
the claimed invention. 

In view of the foregoing, it is respectfully submitted that the final rejection of 
claims 1-6 is in error. Accordingly, reversal of the final rejection is respectfully 
requested. 

Respectfully submitted, 




Clyde R. Christofferson 
Attorney for Applicants 



